CROSS REFERENCE TO RELATED APPLICATION

This is a division of United States Patent Application No. 09/040,149,
filed on March 17, 1998.

BACKGROUND

1. Technical Field

The present invention relates generally to apparatus and methods for
transmitting signals between nodes, and more particularly for transmitting
signals at high bit-rates between nodes.

2. Background

A typical computer or inter-linked set of computers can be modeled as
a series of nodes which communicate with one another point-to-point.
Although nodes have in the past been attached to a bus, modern
communications standards more commonly employ point-to-point
interconnections. In the past, the communication data rate between such
nodes was limited more by the performance of the computer or its various
internal chips than by the speed of the traces or transmission lines by which
the nodes were connected. Now as chip and computer speeds have
substantially increased, these interconnects are hindering further
performance improvements.

More specifically, current signaling architectures as well as the physical
limitations of the traces and transmission lines themselves limit the
maximum inter-node communications rate. Synchronous buses and
point-to-point links are two commonly used interconnect architectures.
Synchronous bus architectures typically broadcast an address block to all
nodes on a multiplexed bus. The node corresponding to the address then
generates an acknowledgment block, which is also broadcast to all of the
nodes. This architecture results in relatively low communications
throughput. This is because each node must be synchronized to the same
common reference clock so that address and acknowledgment blocks can be
transmitted and received. Nodes employing the synchronous bus
architecture must also take turns communicating, further limiting the
maximum possible inter-node data rate, especially when a large number of
nodes are connected to the same bus. The multiplexed buses used with
synchronous bus architectures also typically contain intermediate stubs and
additional signal paths that limit the effective speed of data transfer between
nodes.

Point-to-point link architectures are comparatively more time efficient 
since their signals have an affiliated reference clock signal. Using a 
point-to-point link architecture, two nodes can transfer data at a rate 
independent of other nodes and of any common reference clock. However, 
point-to-point link inter-node data rates still tend to be limited by the physical 
limitations in the traces and transmission lines. 

For instance, modern computers typically communicate by either
single-ended or differential signaling. Both of these forms of signaling are
well known in the art. Ideally, single-ended connections require only one 
physical line per logic signal. However, as communication rates have 
increased so has ground-bounce, which is inherent in single-ended systems. 
Attempts to solve the ground-bounce problem include adding power supply 
and ground pins for each single-ended logic line on a chip, effectively tripling 
the number of physical traces required. Thus, six single-ended logic signals 
can require up to eighteen physical traces. Differential signaling systems 
require two physical traces for each logic signal. Thus, six differential logic 
signals require at least twelve physical traces. Since silicon and computer
resources are finite, a large number of traces or transmission lines can
significantly increase the cost of manufacturing the chip or the computer.

Regardless of whether single-ended or differential signaling is used, the
physical traces and transmission lines all have an inherent parasitic
inductance. As the data rate over these pathways increases, the parasitic
inductance combined with the quickly varying signal currents generates
parasitic voltages that interfere with and corrupt the signals traveling over
these pathways.

Additionally, large signal currents that pass through the traces and
transmission lines can generate Electro-Magnetic Interference (EMI) noise
which further corrupts signals traveling between the nodes. Such EMI noise
may also, from time to time, exceed the limits of various well known
regulatory standards for permissible EMI radiation levels.

Other prior art approaches have employed RAMBUS technology
(manufactured by RAMBUS, Inc. of Mountain View, California) to reduce the
parasitic and EMI noise voltages present on some signal lines. The RAMBUS
approach consists of a number of traces or transmission lines, each of which
transmits a different signal. Ideally, these signal lines are kept in close
proximity to one another. One of the signal lines is designated as a reference
and used to cancel out some of the noise effects present on the signal lines. A
shortcoming of this approach is a noticeable current surge when all of the
signal lines are either logic 1's or logic 0's.

SUMMARY

The present invention delineates an inter-node communications 
paradigm for enabling signals to be transmitted between nodes at a higher 
rate. The higher rate is possible due to an encoding schema that reduces 
current demands and fluctuations between multiple nodes. The encoding 
schema also requires fewer physical traces and/or transmission lines than 
high speed single-ended and differential signaling circuits. 

Within the apparatus of the present invention, a first node is 
connected to a communication channel. Operations on the first node result 
in a first set of signals that are to be transmitted over the communication 
channel. The logic states which comprise the first set of signals may range 
from all logic zeros to all logic ones. This large number of potential logic 
transitions results in large current fluctuations over the communication 
channel. To reduce and/or eliminate these current fluctuations, the present 
invention also includes an encoder or lookup table for transforming the first 
set of signals into a second set of signals having either an equal number, 
nearly an equal number, a constant number, or nearly a constant number of 
logic ones and logic zeros. In one embodiment, groups of six signals from the 
first set of signals are encoded into eight signals in the second set of signals. 

Within the method of the present invention, a first set of signals from 
a first node are encoded into a second set of signals having either an equal 
number, nearly an equal number, a constant number, or nearly a constant 
number of logic ones and logic zeros. This second set of signals is then 
transmitted over a communication channel. 

Thus, the present invention presents a communications technique
which has quieter switching currents than single-ended circuits and requires
fewer physical traces and transmission lines than differential circuits. The
present invention can be applied to communications between computer chips
on a circuit board as well as between nearby computers linked together. These
and other aspects of the invention will be recognized by those skilled in the
art upon review of the detailed description, drawings, and claims set forth
below.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of an apparatus for inter-node
communication;

Figure 2 is a table of unencoded signals utilized by each node within
the apparatus of Figure 1;

Figure 3 is a block diagram of an apparatus for encoding and
transmitting signals between a set of nodes;

Figure 4 is a partial electrical circuit of an encoder and decoder pair
within the apparatus for encoding and transmitting the signals between the
set of nodes;

Figure 5 is a flowchart of a method for inter-node communication; and

Figure 6 is a flowchart of a method for encoding inter-node
communication signals.

DETAILED DESCRIPTION

Figure 1 is a block diagram of an apparatus 100 for transmitting high
bit-rate signals between a set of nodes. The apparatus 100 includes a first node
102, a second node 104, a third node 106, a fourth node 108 and a clock
generator 110. The apparatus is designed to provide the functionality of a
synchronous bus, with the performance advantages of point-to-point
signaling. Each node 102, 104, 106 and 108 receives a clock signal on line 112
from the clock generator 110, data signals on lines 114 A, B, C and D, flag
signals on lines 116 A, B, C and D, and strobe signals on lines 118 A, B, C and
D. Collectively, the signals on lines 114, 116, and 118 make up a shared
communications channel/link between the nodes 102, 104, 106 and 108.
Those skilled in the art will be aware of various other receive and transmit
signals which can also be passed between the nodes 102, 104, 106 and 108,
depending upon the application or transmission protocol.

Data transmissions between nodes 102, 104, 106 and 108 are source
synchronous. Even though all of the nodes also receive the clock signal on
line 112, the strobe signals on lines 118 A, B, C and D act as a reference clock
for these data transmissions. The strobe signal resolves any frequency
differences in timing between the transmitting and receiving node.
Although each node has to compensate for arbitrary frequency differences
between its incoming data and the clock signal on line 112, this arrangement
eliminates the need to insert or delete symbols to adjust for inter-node timing
differences. Strobe signals also enable higher transmission bandwidths
between the nodes because there is no longer a need to accurately synchronize
all of the nodes within the apparatus 100. The inter-node transmission rates
are not affected by inter-node transmission delays, thus arbitrary delays
between each of the nodes are acceptable.

Using the point-to-point link data communication architecture, the
nodes transmit requests and responses as send packets. The receiving node
strips the send packet of its data and then returns a small acknowledgment
(i.e. "ack") packet. Routing architectures for the packets are preferably
straightforward. In the preferred embodiment, the send packets addressed to
a specific node are stripped from the inter-node data stream flowing on lines
114 A, B, C, and D by that node and are replaced with ack packets. The send
and ack packets addressed to other nodes are passed through any intermediate
nodes. The ack packets are stripped from the inter-node data stream when
they return to the original transmitting node.

For example, in order for node 102 to transmit data to node 108, node
102 must first generate a send packet addressed to node 108. Node 102 then
transmits this send packet to node 106. Node 106 looks at the address of the
send packet and since the send packet is not intended for node 106, passes the
send packet along to node 108. Node 108 then looks at the address of the send
packet and since the send packet is intended for node 108, strips the send
packet from the inter-node data stream and generates an ack packet addressed
to node 102. Node 108 then sends the ack packet to node 104. Node 104 looks
at the address of the ack packet and since the ack packet is not intended for
node 104, passes the ack packet along to node 102. Node 102 then looks at the
address of the ack packet and since the ack packet is intended for node 102,
strips the ack packet from the inter-node data stream. After this last step the
data exchange between nodes 102 and 108 is complete.
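
The send/ack exchange just described can be traced with a short simulation.
The following Python sketch is illustrative only. The ring order 102, 106,
108, 104 is taken from the example above, while the names RING, next_node
and exchange are hypothetical and are not part of the apparatus 100.

```python
# Ring order from the worked example: 102 -> 106 -> 108 -> 104 -> 102.
RING = [102, 106, 108, 104]


def next_node(node):
    """Return the downstream neighbor of `node` on the shared link."""
    return RING[(RING.index(node) + 1) % len(RING)]


def exchange(source, target):
    """Trace one send/ack exchange as a list of (kind, sender, receiver) hops."""
    hops = []
    kind, dest, at = "send", target, source
    while True:
        nxt = next_node(at)
        hops.append((kind, at, nxt))
        at = nxt
        if at == dest:
            if kind == "ack":
                # The original transmitter strips the ack; exchange complete.
                return hops
            # The target strips the send packet and replaces it with an ack.
            kind, dest = "ack", source


for hop in exchange(102, 108):
    print(hop)
```
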

The flags on lines 116 A, B, C, and D are used for data stream
packet-framing. Data stream packet-framing consists of labeling each packet
with either a first-packet symbol, a between-packet-idle symbol, or a
last-packet symbol. These flags are also used for arbitration purposes.
Arbitration protocols place a limit on the number of consecutive packets that
can be sent by any one node. This ensures that each node has an opportunity
to send its packets over the shared link.

Preferably one, and only one, of the nodes 102, 104, 106, 108 functions as
a "scrubber" within the apparatus 100. The scrubber performs such
maintenance functions as removing mis-addressed packets on their second
pass around the shared link. The scrubber can be selected by setting a scrubId
line 120 A, B, C or D on the selected node to logic "1" (node 104 in Figure 1)
and setting the remaining scrubId lines 120 A, B, C or D to logic "0" (nodes 102,
106 and 108 in Figure 1). Those skilled in the art will know of other
vendor-dependent scrubber-assignment techniques that can be used.

Figure 2 is a table 200 of unencoded signals utilized by each node
within the apparatus of Figure 1. The signals include a 1-pin scrubber
identifier (scrubId) 202, a 1-pin central clock (clock) 204, a 1-pin strobe 206,
4-pins of flags 208, and 32-pins of data 210.

The scrubId signal 202 is received by the nodes on lines 120 A, B, C, and
D, and uniquely identifies a particular node as the scrubber within the shared
communications channel (also known by those skilled in the art as a ringlet).
In an alternate embodiment, a vendor-dependent scrubber-identification
technique can be provided.

The clock signal 204 is preferably an input-only signal received by the
nodes on line 112, and provides the nodes with a reference for synchronizing
their internal frequency-lock loops. The frequency-lock loops within the
nodes permit a lower clock generator 110 frequency, which in turn simplifies
clock signal distribution. For example, a 50 MHz clock signal can be
frequency-lock-looped up to 500 MHz. Also, since the frequency-lock loops
within the nodes track the clock signal fairly closely, each node sees about
the same number of cycles per second.

Frequency locking, rather than phase locking, is used because arbitrary,
but fixed, phase differences can be tolerated and a small input-data FIFO can
compensate for any incoming phase differences. The FIFO compensates for
the fixed phase error between each node. The fixed phase error is random
and cannot be controlled.

The strobe signal 206, received by the nodes on lines 118 A, B, C, and D,
is preferably complemented on each cycle, so that a receiving node can
accurately determine when incoming (source-synchronous) data should be
latched.

The flag signals 208, received by the nodes on lines 116 A, B, C, and D,
transfer control and framing information between the nodes.

The data signals 210, received by the nodes on lines 114 A, B, C, and D,
transmit the contents of the send and ack packets. Preferably, most-significant
bits are sent first when sending a packet header, and lower addresses are sent
first when sending data. While the present inter-node data packets contain 32
data bits, other data packet capacities, such as 8, 16, 64, or 128 data bits, are also
acceptable. Alternatively, large data-words may be broken up and sent over
multiple clock cycles. For instance, a large 64 bit data packet could be sent as
two smaller 32 bit data packets. When deciding how many bits to include in a
data packet, designers should consider that while 8-bit data packets may be
more cost effective in terms of coding or hardware to implement, such small
data packet designs have less bandwidth per pin due to the relatively-fixed
overhead of the scrubId, clock, strobe, and flag pins. Conversely, while 128-bit
data packets may have more bandwidth per pin, such large data packets may
be more expensive to implement in terms of coding, hardware, and/or
skew-management circuits.
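
The bandwidth-per-pin trade-off can be made concrete with a little
arithmetic. The Python sketch below assumes that the non-data overhead stays
at the seven pins listed for Figure 2 (one scrubId, one clock, one strobe and
four flag pins) regardless of the data width, which is a simplification of
the "relatively-fixed" overhead described above. The names OVERHEAD_PINS and
data_pin_fraction are hypothetical.

```python
from fractions import Fraction

# Non-data pins per the Figure 2 signal table: scrubId, clock, strobe, flags.
# Assumption: this overhead does not change with the data width.
OVERHEAD_PINS = 1 + 1 + 1 + 4


def data_pin_fraction(data_pins, overhead_pins=OVERHEAD_PINS):
    """Fraction of a node's unencoded signal pins that carry packet data."""
    return Fraction(data_pins, data_pins + overhead_pins)


for width in (8, 16, 32, 64, 128):
    share = data_pin_fraction(width)
    total = width + OVERHEAD_PINS
    print(f"{width:3d} data pins: {float(share):.1%} of {total} pins carry data")
```
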

Figure 3 is a block diagram of an apparatus 300 for encoding and
transmitting the signals 206, 208, and 210 between a set of nodes. The
apparatus includes a driver node 302 and a receiver node 304 which can
represent any pair of the Figure 1 nodes 102, 104, 106, and 108. The driver
node 302 includes a plurality of encoders 302 A-G and the receiver node 304
includes a plurality of decoders 304 A-G. The nodes 302 and 304 communicate
via a set of traces, circuit paths, and/or transmission lines 308. These traces,
circuit paths, and transmission lines 308 collectively make up part of the
communications channel between the two nodes and perform the same roles
as the data 114, flag 116, and strobe 118 lines of Figure 1. The nodes
themselves can include any number of inter-linked devices. These
inter-linked devices can be computer chips, circuit boards, and/or stand alone
computers.

The apparatus 300 transmits the signals between the nodes either over a
dedicated line (such as for the scrubId signals on lines 120 A-D and the clock
signal on line 112) or after implementing an encoding schema (such as for the
strobe, flag and data signals). By selectively encoding the strobe, flag and data
signals transmitted between the nodes 302, 304, ground-bounce during
high-speed signaling can be reduced. Preferably the strobe, flag and data
signals are grouped and encoded in such a way that nearly an equal (called
DC-free encoding) and/or constant (called DC-balanced encoding) number of
logic 0's and 1's are always transmitted between the driver node 302 and the
receiver node 304. These encoding schemas are preferably implemented
using an even number of data lines 308.

One way to implement a DC-balanced encoding schema is shown in
Figure 3. The strobe 206 signal is fed into a 1-to-2 encoder 302A which uses a
complementary encoding schema before the signal is transmitted over the
data lines 308. The flag 208 and data 210 signals, however, are divided into
groups of six unencoded signals (i.e. 6-bits), which are then converted into
groups of eight encoded signals (i.e. 8-bits). These eight encoded signals are
then transmitted in parallel over the data lines 308. Other signal coding
schemas, such as when groups of four unencoded signals (i.e. 4-bits) are
converted into groups of six encoded signals (i.e. 6-bits), can also be used.

DC-free encoding schemas can be implemented by transmitting an
even number of encoded signals with an equal number of logic 1's and 0's in
parallel over the data lines 308.
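
A lookup table for such a code can be generated mechanically. The Python
sketch below enumerates every n-bit word with exactly n/2 logic 1's and
assigns the first 2^k of them, in ascending numeric order, to the k-bit
unencoded values. The ascending assignment and the names balanced_codewords
and build_tables are assumptions made for illustration; the specification
does not state which codeword is paired with which unencoded value.

```python
from itertools import combinations


def balanced_codewords(n):
    """All n-bit words (as ints) with exactly n // 2 bits set, ascending."""
    words = []
    for ones in combinations(range(n), n // 2):
        words.append(sum(1 << bit for bit in ones))
    return sorted(words)


def build_tables(k, n):
    """Return (encode, decode, spare) tables for a k-to-n balanced code.

    The value-to-codeword assignment is illustrative, not the patent's.
    """
    words = balanced_codewords(n)
    if len(words) < 2 ** k:
        raise ValueError("not enough balanced codewords for this k and n")
    encode = {value: words[value] for value in range(2 ** k)}
    decode = {code: value for value, code in encode.items()}
    spare = words[2 ** k:]  # unmapped codewords, usable as control characters
    return encode, decode, spare


encode, decode, spare = build_tables(6, 8)
print(len(encode), "mapped codewords,", len(spare), "spare")
```
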

The DC-balanced and DC-free encoding schemas have the following
characteristics: First, a constant number of 1-valued lines is always driven
and thus the driver node 302 is balanced. A balanced driver node means that
total current over the data lines 308 is fairly constant. Second, ground-bounce
noise is reduced, since logic transitions from all zeros to all ones and from all
ones to all zeros are eliminated. Third, an implied reference voltage can be
obtained by averaging all of the data line 308 voltages. Fourth, parity
protection is inherent, since all single-bit transmission failures, and many
double-bit errors, can be detected as an illegal (i.e. non-DC-balanced) input
code value. Fifth, peak current demands are reduced, since for each of the
unencoded signals, only half of the data lines are actively driven to a high
current logic state (such as logic "1"). Sixth, the encoding schema allows extra
control characters to be transmitted (for instance, in a 6-to-8 bit encoding
schema there are 6 unmapped 8-bit encoded values for each set of 64
unencoded values).
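
The parity-protection property can be checked by brute force. The Python
sketch below counts how many single-bit and double-bit error patterns turn a
balanced 8-bit word into an unbalanced one. It considers only the balance
test, not the additional rejection of the 6 unmapped codewords, and the names
is_balanced and detected_errors are hypothetical.

```python
from itertools import combinations


def is_balanced(word, n=8):
    """True when exactly half of the n bits of `word` are logic 1."""
    return bin(word & ((1 << n) - 1)).count("1") == n // 2


def detected_errors(word, flips, n=8):
    """Count `flips`-bit error patterns caught by the balance check.

    Returns (caught, total) over every choice of `flips` bit positions.
    """
    patterns = list(combinations(range(n), flips))
    caught = 0
    for bits in patterns:
        corrupted = word
        for bit in bits:
            corrupted ^= 1 << bit
        if not is_balanced(corrupted, n):
            caught += 1
    return caught, len(patterns)


print(detected_errors(0b00001111, 1))  # every single-bit error is caught
print(detected_errors(0b00001111, 2))  # only same-direction double flips are caught
```
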

A nearly DC-free/nearly DC-balanced encoding schema can be
implemented by transmitting an odd number of bits, which contain no more
than one extra logic 1 or logic 0, in parallel over the data lines 308. For
instance, a 6/7 encoding schema (i.e. where 6-bits are encoded into 7-bits) may
be used where a logic 1 to logic 0 ratio is either 3-to-4 or 4-to-3. Although
more efficient, this nearly free/balanced encoding schema has no parity
protection and may be subject to signal-integrity limitations. This encoding
schema also results in a less accurate threshold reference voltage, once all of
the received signal values are averaged by the decoders 304 A-G.
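
The loss of parity protection in a 6/7 schema can be illustrated by counting
codewords. The Python sketch below enumerates the 7-bit words having three or
four logic 1's and shows that flipping a single bit can move a word from one
permitted weight to the other, so a balance check alone would not flag the
error. The function names are hypothetical, and whether the corrupted word is
actually a mapped codeword depends on a table that the specification does not
give.

```python
from itertools import combinations


def words_with_weight(n, weight):
    """All n-bit words (as ints) with exactly `weight` bits set."""
    return [sum(1 << bit for bit in ones)
            for ones in combinations(range(n), weight)]


def nearly_balanced_words(n=7):
    """n-bit words (n odd) whose counts of ones and zeros differ by one."""
    return sorted(words_with_weight(n, n // 2) +
                  words_with_weight(n, n // 2 + 1))


def undetected_single_flips(word, n=7):
    """Bit positions whose flip still leaves a permitted weight."""
    permitted = {n // 2, n // 2 + 1}
    return [bit for bit in range(n)
            if bin(word ^ (1 << bit)).count("1") in permitted]


print(len(nearly_balanced_words()))         # candidate codewords for 64 values
print(undetected_single_flips(0b0000111))   # flips that keep a permitted weight
```
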

The encoding schema herein taught requires extra pinouts (for
example, the 6-to-8 schema requires an additional 2 pins for every 6 signals).
However, these additional demands upon limited silicon and PC-board
resources are offset by a reduction in the number of required ground and/or
power pins when compared with current unencoded full-swing signals
transmitted over single-ended data lines. To localize any chip or PC-board
ground-plane currents, each set of 8 signal traces is preferably routed as a
group. While the 6-to-8 bit encoding schema is preferred, the number of
unencoded signal lines is not always a multiple of 6. In those cases other
encoding options are possible, such as 1-to-2 (differential), 2-to-4 (also
differential), and 4-to-6 encoding schemas.
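
The pin trade-off can be tallied for the signal set of Figure 2. The Python
sketch below assumes the strobe uses the 1-to-2 encoder and that the 4 flag
and 32 data signals are grouped together for k-to-n encoding. The
single-ended figure uses the worst-case tripling mentioned in the Background
(a power pin and a ground pin for every logic line), so it is an upper bound
rather than a typical value. The name line_counts is hypothetical.

```python
STROBE_PINS, FLAG_PINS, DATA_PINS = 1, 4, 32  # from the Figure 2 signal table


def line_counts(k=6, n=8):
    """Compare physical line counts for the strobe, flag and data signals."""
    grouped = FLAG_PINS + DATA_PINS
    groups, leftover = divmod(grouped, k)
    if leftover:
        raise ValueError("signal count is not a multiple of the group size")
    signals = STROBE_PINS + grouped
    return {
        "encoders": STROBE_PINS + groups,           # one per strobe, one per group
        "encoded": 2 * STROBE_PINS + n * groups,    # 1-to-2 strobe plus k-to-n groups
        "differential": 2 * signals,                # two traces per logic signal
        "single_ended_worst_case": 3 * signals,     # signal + power + ground each
    }


print(line_counts())
```

With these assumptions the 6-to-8 schema needs seven encoders, which agrees
with the encoders 302 A-G of Figure 3.
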

Figure 4 is a partial electrical circuit 400 of the encoder 302 A and the
decoder 304 A pair within the apparatus 300 for encoding and transmitting the
signals between the set of nodes. The circuit 400, shown in Figure 4, effects a
6-to-8 bit encoding schema which employs a simple technology-independent
technique for driving the data lines 308 by switching a constant current 402
(i.e. "4i") to half (i.e. 4) of the total number (i.e. 8) of data lines 308, which are
driven to a high current logic state. Only half of the data lines are
driven since a DC-free encoding schema is effected. A zero volt termination
voltage 404, rather than a fixed positive voltage, enables the circuit to be
supply-voltage independent. In the circuit 400, termination resistors, Rt, are
effected using on chip FETs. The value of Rt is matched to the impedance of
each data-line 308. A typical termination resistance is 75 Ω. Averaging
resistors, Rc, are chosen so that the threshold reference voltage of the decoder
304A is one half of the driven signal levels on the data lines 308. The decoder
304 A then uses differential amplifiers 406 to compare the signals on the
data-lines 308 with the threshold reference voltage to determine whether, for
example, a logic "1" or a logic "0" has been transmitted.
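
The averaging threshold can be modeled numerically. The Python sketch below
is a simplified idealization, not a circuit simulation: it assumes each
driven line develops a voltage of i × Rt across a single 75 Ω termination to
0 V, that undriven lines sit at 0 V, and that the averaging resistors produce
the arithmetic mean of the eight line voltages. The names line_voltages and
decode_lines, and the example current of 10 mA, are hypothetical.

```python
def line_voltages(codeword, unit_current, r_term=75.0, n=8):
    """Idealized line voltages: i * Rt when driven, else the 0 V termination."""
    return [unit_current * r_term if (codeword >> bit) & 1 else 0.0
            for bit in range(n)]


def decode_lines(voltages):
    """Recover the word by comparing each line with the average of all lines."""
    threshold = sum(voltages) / len(voltages)
    word = 0
    for bit, volts in enumerate(voltages):
        if volts > threshold:
            word |= 1 << bit
    return word, threshold


volts = line_voltages(0b01011010, unit_current=0.010)
word, threshold = decode_lines(volts)
print(f"{word:#010b}", threshold)
```

Because the threshold is derived from the lines themselves, adding the same
offset to every line voltage leaves the decoded word unchanged in this model.
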

Portions of the encoding and decoding hardware described in Figure 3
may also be implemented in software. In so doing, additional configurable
elements, such as a processing unit, a memory, and a storage device (not
shown) need to be added to Figures 1, 3, and 4. The memory would store
computer program instructions for controlling how the processing unit
accesses, transforms and outputs data. Those skilled in the art will recognize
that in alternate embodiments the memory could be replaced with a
functionally equivalent computer-useable medium such as a compact disk, a
hard drive or a memory card.

Figure 5 is a flowchart of a method for inter-node communication.
The method begins in step 502 where a driver node and a receiver node are
identified within a communications system. Next, in step 504, a set of
unencoded signals transmitted from the driver node to the receiver node is
identified. The signals include a set of data values (i.e. logic "1's" and logic
"0's"). In step 506, the unencoded data values of the unencoded signals are
encoded to produce encoded signals. Next, in step 508, the encoded signals are
transmitted from the driver node to the receiver node. In step 510, the
encoded signals are decoded. After step 510, the method for inter-node
communication is complete.
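
Steps 506 through 510 can be sketched in software for one 36-bit group of
flag and data signals. The Python sketch below splits the word into six 6-bit
groups, encodes each into a balanced 8-bit code, and decodes at the receiver,
rejecting any illegal code. The codeword assignment, the
least-significant-group-first ordering, the ideal channel, and the names
encode_word, decode_word and transfer are all assumptions made for
illustration.

```python
from itertools import combinations

# Balanced 8-bit codewords in ascending order; the first 64 are mapped.
# This assignment is illustrative and is not taken from the specification.
CODEWORDS = sorted(sum(1 << bit for bit in ones)
                   for ones in combinations(range(8), 4))
ENCODE = CODEWORDS[:64]
DECODE = {code: value for value, code in enumerate(ENCODE)}


def encode_word(word, groups=6):
    """Step 506: split into 6-bit groups (least significant first) and encode."""
    return [ENCODE[(word >> (6 * g)) & 0x3F] for g in range(groups)]


def decode_word(codes):
    """Step 510: reject illegal codes, then reassemble the 6-bit groups."""
    word = 0
    for g, code in enumerate(codes):
        if code not in DECODE:
            raise ValueError(f"illegal code on group {g}: {code:#010b}")
        word |= DECODE[code] << (6 * g)
    return word


def transfer(word):
    """Steps 506 to 510, with an ideal channel standing in for step 508."""
    return decode_word(encode_word(word))


print(hex(transfer(0xABCDE1234)))
```
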

Figure 6 is a flowchart of a method for encoding the data values of the
inter-node communication signals (step 506 of Figure 5). The method begins
in step 602 by selecting a code such that a difference between a total number of
unencoded data values and a total number of encoded data values is a small
predetermined fraction of the total number of unencoded data values. A 6-bit
unencoded signal to 8-bit encoded signal coding schema is preferred; however,
a 4-bit unencoded signal to 6-bit encoded signal coding schema may be more
appropriate in some applications. Codes may be stored in a set of lookup
tables or generated using algorithmic methods. Codes may also be switched at
any point during signal transmission provided that the switch is
communicated to the receiver node. In step 604, the code is selected such that
an encoded signal has an equal number of logic 1's and 0's. Alternatively, in
step 606 the code is selected such that an encoded signal has nearly an equal
number of logic 1's and 0's. Alternatively, in step 608 the code is selected such
that an encoded signal has a constant number of logic 1's and 0's.
Alternatively, in step 610 the code is selected such that an encoded signal has
nearly a constant number of logic 1's and 0's. After step 610, the method for
encoding the inter-node communication signals is complete.
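
The four alternatives of steps 604 through 610 can be expressed as simple
tests on a candidate codebook. The Python sketch below reflects one reading
of the terms: "equal" means every codeword has as many 1's as 0's, "nearly
equal" means they differ by at most one, "constant" means every codeword has
the same number of 1's, and "nearly constant" means that number varies by at
most one across the codebook. The overhead of step 602 is taken here as
(n - k) / k extra signals per unencoded signal. These interpretations and the
names classify_code and overhead_fraction are assumptions.

```python
from fractions import Fraction


def classify_code(codewords, n):
    """Report which balance properties a set of n-bit codewords satisfies."""
    weights = {bin(word).count("1") for word in codewords}
    imbalance = max(abs(2 * weight - n) for weight in weights)
    return {
        "equal": imbalance == 0,
        "nearly_equal": imbalance <= 1,
        "constant": len(weights) == 1,
        "nearly_constant": max(weights) - min(weights) <= 1,
    }


def overhead_fraction(k, n):
    """Extra encoded signals per unencoded signal for a k-to-n code."""
    return Fraction(n - k, k)


print(classify_code([0b00001111, 0b11110000], 8))
print(overhead_fraction(6, 8), overhead_fraction(4, 6))
```
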

While the present invention has been described with reference to a preferred
embodiment, those skilled in the art will recognize that various
modifications may be made. Variations upon and modifications to the
preferred embodiment are provided by the present invention, which is
limited only by the following claims.